Virtual machine object version control

ABSTRACT

A method for indexing virtual machine version snapshots in a virtualization environment commences upon receiving a request (e.g., from an administrator or agent) to initiate a virtual machine version snapshot operation on a subject virtual machine. Processes within or controlled by the subject virtual machine are requested to temporarily suspend transactions and file I/O. When the processes that have been requested to temporarily suspend transactions and file I/O acknowledge quiescence, the method continues by generating a virtual machine version snapshot data structure. An entry in an index is formed from the virtual machine version snapshot data structure. Multiple instances of virtual machine version snapshot data structures can be stored in the index, and the index can be queried to determine the state that a virtual machine had at any of the snapshotted moments in time.

CROSS REFERENCE TO RELATED APPLICATIONS

The present application is a continuation application of U.S. patent application Ser. No. 15/060,140, filed on Mar. 3, 2016, which is hereby incorporated by reference in its entirety.

FIELD

This disclosure relates to file system content data protection, and more particularly to techniques for virtual machine object version control in a virtual machine environment.

BACKGROUND

Many computing clusters combine reconfigurable processing resources (e.g., single- and multi-core processors in a network or mesh) with reconfigurable resources (e.g., virtual machines, file systems, databases, backup storage, etc.). Many deployments of such computing clusters employ virtual machines that provide computing and storage services to applications. Certain low-level computing and low-level storage capabilities are provided natively by a hypervisor that runs a particular operating system (e.g., a Microsoft operating system). Unfortunately, reliance on the hypervisor and on the underlying operating system imposes severe limitations on maintaining versions (e.g., snapshots) of certain of these reconfigurable resources. In particular, there are many scenarios where the aforementioned applications need storage services that the underlying operating system does not provide to the degree needed by the applications and/or by system administrators. For example, when an application or system administrator needs to create, store, and manage many versions of a file (e.g., snapshots of a file), the limitations of the underlying operating system (e.g., limitations attendant to static allocations, limitations attendant to the number of concurrently stored versions, etc.) prevent the application or system administrator from implementing desired behaviors, such as enabling fast user searches over a set of stored snapshots in order to identify one or more snapshots that contain a particular searched-for text string.

File system filers that support file-level version control either keep the versioning metadata as part of the file system hosting the files, or keep the versioning metadata in a separate storage volume on the same system. In the first legacy scenario (i.e., keeping the versioning metadata as part of the file system), two acute limitations emerge: (1) the versioning metadata is limited to attributes pertaining to the file system hosting the files, and (2) there is an operating-system-dependent limit on the number of versions that can be supported, due to space constraints associated with a file volume. In the second scenario (keeping the versioning metadata in a separate storage volume), the many versions of the file are stored in the separate storage volume, and the storage volume grows super-linearly in size with the number of versions of the files being created and maintained. The performance of accesses to the versioning metadata and the corresponding versioned data degrades over time due to fragmentation and other undesirable effects that result from indexing a large number of files and/or file changes. Performance degradations are further exacerbated when a single persistently stored storage object is accessed by two or more systems, since the size of the index grows super-linearly with respect to the number of systems and their respective system-specific attributes, as well as with respect to the number of versions of the files being created.

The problems associated with the aforementioned legacy approaches are exacerbated for systems that implement virtual machines. A virtual machine often possesses traits that are not present in files, and thus not present in file snapshots. For example, a virtual machine possesses not only a storage footprint (e.g., pertaining to storage objects in use), but also a state footprint (e.g., pertaining to computing resources in use and/or any forms of state variables). Merely storing snapshots that comprise versions of storage objects in use (e.g., files) fails to account for versioning of the state information pertaining to a particular virtual machine. Inasmuch as the legacy approaches fail to account for versioning of the state information of virtual machines, they also fail to provide an index or other mechanism that facilitates system administration or other access to an index comprising virtual machine snapshot versions. What is needed is a way to provide virtual machine version querying using an index comprising many snapshots of versioned state information derived from snapshotted virtual machines.

What is needed is a technique or techniques to improve over legacy approaches.

SUMMARY

The present disclosure provides a detailed description of techniques used in systems, methods, and computer program products for virtual machine object version control in addition to native operating system functions in a virtual machine environment.

In one embodiment, a method for taking virtual machine object version snapshots in a virtualization environment commences upon receiving a request (e.g., from an administrator or agent) to initiate a version snapshot operation on a subject storage object. Processes (e.g., virtual machines) on two or more systems that have the storage object open are requested to temporarily suspend processing. When the processes acknowledge quiescence, the method continues by traversing virtual machine data structures so as to generate a virtual machine object. An index is formed from the virtual machine object, wherein the index comprises metadata that is received from or derived from information originating from any number of virtual machines and/or their respective resources and/or any number of systems.

Further details of aspects, objectives, and advantages of the technological embodiments are described herein and in the following descriptions, drawings, and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The drawings described below are for illustration purposes only. The drawings are not intended to limit the scope of the present disclosure.

FIG. 1A1 depicts an environment including a filer.

FIG. 1A2 depicts a multi-node computing environment including a filer virtual machine.

FIG. 1A3 depicts an environment where virtual machine object version control is provided.

FIG. 1A4 depicts an environment where virtual machine object version control is provided for two or more systems.

FIG. 1B1 and FIG. 1B2 present depictions of an index that is embodied using a virtual disk hierarchical path layout used in systems that provide virtual machine object version control in addition to native operating system file system functions, according to an embodiment.

FIG. 1C depicts a metadata layout to implement a volume index as used in systems that provide virtual machine object version control in addition to native operating system file system functions, according to an embodiment.

FIG. 1D depicts a volume index generation technique as used in systems that provide virtual machine object version control, according to an embodiment.

FIG. 2 depicts volume index maintenance techniques as used in systems that provide virtual machine object version control in addition to native operating system file system functions, according to an embodiment.

FIG. 3A depicts a snapshot version management technique as used in systems that provide virtual machine object version control in addition to native operating system file system functions, according to an embodiment.

FIG. 3B depicts a snapshot version management technique using an external service to provide virtual machine object version control in addition to native operating system file system functions, according to an embodiment.

FIG. 4 depicts messages and operations in a protocol as used in systems that provide virtual machine object version control in addition to native operating system file system functions, according to an embodiment.

FIG. 5 depicts an example inode referencing technique as used in volume index layouts that facilitate virtual machine object version control in addition to native operating system file system functions, according to an embodiment.

FIG. 6 depicts an example snapshot version management user interface as used in systems that facilitate virtual machine object version control in addition to native operating system file system functions, according to an embodiment.

FIG. 7 depicts an example version snapshot restore procedure, according to an embodiment.

FIG. 8A, FIG. 8B, and FIG. 8C depict system components as arrangements of computing modules that are interconnected so as to implement certain of the herein-disclosed embodiments.

FIG. 9A and FIG. 9B depict architectures comprising a collection of interconnected components suitable for implementing embodiments of the present disclosure and/or for use in the herein-described environments.

DETAILED DESCRIPTION

Some embodiments of the present disclosure address the problems where native operating systems fail to offer storage management facilities sufficient to handle file version management requirements in virtual machine clusters. Some embodiments are directed to approaches for deploying a filer virtual machine such that virtual machine version storage management is handled by a version management agent rather than by the underlying operating system.

Overview

A filer can be implemented as a virtual machine (a filer virtual machine, or filer VM). Any virtual machine accessible by the filer VM can be snapshotted at any moment in time. Any snapshot of any such virtual machine can be persisted as a file, and can be accessed by the filer VM as a virtual disk. Each virtual disk can be formatted as a journaled file system. For each such journaled file system, a file index is maintained in memory (e.g., see FIG. 1B1), and the index can be periodically persisted to disk. The index comprises a wide range of virtual machine attributes and virtual machine attribute values that pertain to the aforementioned virtual machine. Such attributes include aspects of the snapshotted state variables (e.g., memory state, peripheral state, etc.) of the virtual machine. The index further comprises a wide range of virtual machine attributes and virtual machine attribute values that pertain to files (e.g., files that can be mounted as virtual disks). Such an index can be maintained as a hash table so that lookup queries pertaining to the attributes and/or attribute values can be satisfied in O(1) expected time. The index can further comprise a list of versions, either by including a list of version identifiers in the index itself, or by including a reference to another location where a list of version identifiers is stored. Accesses to the index can be performed by any computational element using a set of function calls to handle file add events and file delete events (see FIG. 2). Such functions operating over the index can be performed in O(1) expected time, regardless of the size of the index, the number of files under version control, or the number of versions that are accessible.
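A hash-table index of the kind described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the class `SnapshotIndex` and its method names are hypothetical, attribute values are assumed to be hashable, and a file delete event is assumed to mark the path as deleted rather than discard its version list, so that earlier versions stay restorable.

```python
from collections import defaultdict


class SnapshotIndex:
    """Hypothetical in-memory index (names are illustrative, not from the source).

    by_attribute: (attribute, value) -> set of snapshot IDs with that value
    versions:     file path -> list of version identifiers
    deleted:      paths whose most recent event was a delete
    """

    def __init__(self):
        self.by_attribute = defaultdict(set)
        self.versions = defaultdict(list)
        self.deleted = set()

    def record_snapshot(self, snapshot_id, attributes):
        # Cost grows with the number of attributes recorded, not with index size.
        for name, value in attributes.items():
            self.by_attribute[(name, value)].add(snapshot_id)

    def on_file_add(self, path, version_id):
        # Expected O(1): one hash lookup, one list append, one set discard.
        self.versions[path].append(version_id)
        self.deleted.discard(path)

    def on_file_delete(self, path):
        # Expected O(1): mark the path deleted but keep its version list,
        # so earlier snapshots of the file remain restorable.
        self.deleted.add(path)

    def lookup(self, name, value):
        # One hash lookup (expected O(1)); copying the result costs O(matches).
        return set(self.by_attribute.get((name, value), ()))
```

Because both event handlers touch only the entry for a single path, their cost does not depend on how many files or versions the index already holds.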

In a virtual machine environment, in particular where a filer is implemented using a virtual machine, an agent runs in the filer virtual machine. The agent can be controlled by software using any means such as, for example, a remote process communication (RPC) interface. The agent can take on various responsibilities, possibly including managing processing within the environment so as to bring the file system to a state of quiescence before taking a version snapshot of the virtual disk that hosts the file system.

Snapshotting Events

Version snapshots can be scheduled to occur at a particular time or in a periodic time pattern, or a version snapshot can be initiated at any moment in time responsive to an asynchronous command (see FIG. 3). Consistency within a particular version snapshot is facilitated by an agent that communicates with processes that have the corresponding files open (e.g., for read access or for read-write access). More specifically, when the time comes for a version snapshot to be taken over a particular file (e.g., on a schedule, or on the basis of an asynchronous command), an agent will issue a quiescence request to all processes that have the file open. In some cases, a scheduler can issue a “take snapshot” command to the agent (e.g., with a snapshot UUID and an expiry time). In response to the occurrence of the scheduled time, an asynchronous command, or any “take snapshot” command, the agent can request quiescence of all input/output (henceforth, I/O or IO) from all processes. Upon receiving an acknowledgement from the processes, the agent can initiate operations that serve to dump the index together with associations with or to the snapshot version created. The dumped index is saved in a quarantined area (e.g., in a separate file, or in a table in a special directory). This process can be repeated each time a version snapshot is created (e.g., each time a snapshot block list is saved), and can be repeated for each virtual disk hosting the user data. To facilitate fast access to file snapshots, an index in the form of a hash table is maintained in memory. The hash table stores a mapping from a file name or file path to a set of available snapshots (e.g., path → {S1, S2, . . . , Sm}). This fast-access organization can be further improved by maintaining a hash table of inode identifiers. Logic is provided to handle the case where a path might be added (e.g., creating a file), then deleted (deleting the file), and then re-added (in which case the inode identifier might change).
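The agent's handling of a “take snapshot” command (request quiescence, wait for acknowledgements, dump the index into a quarantined area, then let processes resume) might look roughly like the sketch below. `Process`, `take_snapshot`, and the one-JSON-file-per-snapshot quarantine layout are all assumptions made for illustration; the actual capture of the virtual disk block list is only marked by a comment.

```python
import json
import os
import uuid


class Process:
    """Hypothetical stand-in for a process that has the file open."""

    def __init__(self, name):
        self.name = name
        self.suspended = False

    def quiesce(self):
        # Temporarily suspend transactions and file I/O, then acknowledge.
        self.suspended = True
        return True

    def resume(self):
        self.suspended = False


def take_snapshot(processes, index, quarantine_dir, expiry, snapshot_id=None):
    """Sketch of an agent handling a 'take snapshot' command.

    Returns the path of the dumped index file in the quarantined area.
    """
    snapshot_id = snapshot_id or str(uuid.uuid4())
    try:
        # 1. Request quiescence from every process; all must acknowledge.
        if not all(p.quiesce() for p in processes):
            raise RuntimeError("quiescence not acknowledged")
        # 2. (The virtual disk snapshot / block list would be captured here.)
        # 3. Dump the index, tagged with this snapshot version, into the
        #    quarantined area: here, one JSON file per snapshot.
        os.makedirs(quarantine_dir, exist_ok=True)
        path = os.path.join(quarantine_dir, f"{snapshot_id}.json")
        with open(path, "w") as f:
            json.dump({"snapshot": snapshot_id, "expiry": expiry, "index": index}, f)
        return path
    finally:
        # 4. Let all processes resume, whether or not the snapshot succeeded.
        for p in processes:
            p.resume()
```

Putting the resume step in a `finally` clause is one way to make sure a failed snapshot never leaves processes suspended.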

Fast Lookup Implementation and Uses

The foregoing provides a fast two-step lookup from a path and filename to an available version snapshot, and on to the inode corresponding to that version snapshot, which in turn comprises a snapshot block list. As such, it is possible to restore a file to the contents of any previous version snapshot. In some cases, a data structure such as a bit vector (e.g., as is often used to implement a Bloom filter) can be used to determine either (1) that a path is definitely not in the set of paths in the index, or (2) that the path may be in the set. Lookups of paths from an index can be implemented using any form of lookup, including ‘hash’ lookups using independent and uniformly distributed hashing functions, possibly in conjunction with a Bloom filter.
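The sketch below shows the two-step lookup (path to snapshots, then snapshot to inode) with a Bloom filter in front of it. It is an illustration under stated assumptions: the class names are hypothetical, the k hash functions are derived by salting SHA-256 rather than by any scheme named in the source, and inodes are keyed by (path, snapshot) so that a path that is deleted and re-added under a new inode still resolves correctly for every snapshot.

```python
import hashlib


class BloomFilter:
    """Bit-vector Bloom filter: False means definitely absent,
    True means possibly present."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as the bit vector

    def _positions(self, key):
        # Derive k bit positions by salting SHA-256 with the hash number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class PathIndex:
    def __init__(self):
        self.bloom = BloomFilter()
        self.snapshots = {}  # path -> [snapshot IDs], i.e. path -> {S1, ..., Sm}
        self.inodes = {}     # (path, snapshot ID) -> inode ID

    def add(self, path, snapshot_id, inode_id):
        self.bloom.add(path)
        self.snapshots.setdefault(path, []).append(snapshot_id)
        self.inodes[(path, snapshot_id)] = inode_id

    def lookup(self, path):
        # Cheap negative check first; then path -> snapshots -> inodes.
        if not self.bloom.might_contain(path):
            return []
        return [(s, self.inodes[(path, s)]) for s in self.snapshots.get(path, [])]
```

A false positive from the filter costs only one extra hash-table miss, which is why the filter can be undersized without harming correctness.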

Restore Procedure

When the end user needs to restore a version snapshot, or for any reason desires to survey previous versions of a given file, a command comes into the filer (e.g., from a user interface). A lookup of the index is performed, and the results are organized for display to the requesting user. The display enables the user to choose which version snapshot(s) to view and/or which version snapshot(s) should be the object of some action or operation, such as a restore operation. A restored version snapshot can be mounted from within the filer virtual machine, and the virtual disk corresponding to the user-selected version snapshot can be restored. In some cases, the virtual disk corresponding to the user-selected version snapshot can be restored through a background process and attached to the filer VM when the background restore is deemed to have been completed.
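The lookup-and-select portion of the restore procedure can be illustrated as follows; mounting and background restore are out of scope. The function names, the shape of the index (path mapped to `(snapshot_id, timestamp)` pairs), and the newest-first ordering are assumptions chosen for the example.

```python
def list_versions(index, path):
    """Return (snapshot_id, timestamp) pairs for `path`, newest first,
    ready to show to the requesting user."""
    return sorted(index.get(path, []), key=lambda entry: entry[1], reverse=True)


def choose_restore_target(versions, snapshot_id):
    """Return the entry the user selected, or raise if it is not available."""
    for entry in versions:
        if entry[0] == snapshot_id:
            return entry
    raise KeyError(f"snapshot {snapshot_id!r} not available")
```

For example, with `index = {"/home/u/F1": [("S0F1T0", 100), ("S1F1T1", 200)]}`, `list_versions(index, "/home/u/F1")` lists `S1F1T1` before `S0F1T0`.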

Various embodiments are described herein with reference to the figures. It should be noted that the figures are not necessarily drawn to scale and that elements of similar structures or functions are sometimes represented by like reference characters throughout the figures. It should also be noted that the figures are only intended to facilitate the description of the disclosed embodiments; they are not representative of an exhaustive treatment of all possible embodiments, and they are not intended to impute any limitation as to the scope of the claims. In addition, an illustrated embodiment need not portray all aspects or advantages of usage in any particular environment. An aspect or an advantage described in conjunction with a particular embodiment is not necessarily limited to that embodiment and can be practiced in any other embodiments even if not so illustrated. Also, references throughout this specification to “some embodiments” or “other embodiments” refer to a particular feature, structure, material, or characteristic described in connection with the embodiments as being included in at least one embodiment. Thus, the appearances of the phrases “in some embodiments” or “in other embodiments” in various places throughout this specification do not necessarily refer to the same embodiment or embodiments.

Definitions

Some of the terms used in this description are defined below for easy reference. The presented terms and their respective definitions are not rigidly restricted to these definitions; a term may be further defined by the term's use within this disclosure. The term “exemplary” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word exemplary is intended to present concepts in a concrete fashion. As used in this application and the appended claims, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise, or clear from the context, “X employs A or B” is intended to mean any of the natural inclusive permutations. That is, if X employs A, X employs B, or X employs both A and B, then “X employs A or B” is satisfied under any of the foregoing instances. As used herein, at least one of A or B means at least one of A, or at least one of B, or at least one of both A and B. In other words, this phrase is disjunctive. The articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or clear from the context to be directed to a singular form.

Reference is now made in detail to certain embodiments. The disclosed embodiments are not intended to be limiting of the claims.

DESCRIPTIONS OF EXEMPLARY EMBODIMENTS

FIG. 1A1 depicts an environment including a filer. As shown, a client facility might host any number of client machines (e.g., laptops, deskside machines, servers, etc.), any of which can access files using a network-attached storage (NAS) subsystem such as the shown NAS filer. The NAS filer manages back-end storage (e.g., see the depicted storage devices), and may store any number of files and/or any number of file versions. A client can access the NAS filer over a network (e.g., an Ethernet network or other network). A client file access request can include, for example, the name of a file and/or its version. The filer front end receives the file access request, then performs a mapping from information in the file access request to information pertaining to a starting physical location or locations of a respective file.

A physical location can be described by a drive or unit ID, an extent or volume name, a logical block identifier, etc. When the physical location is resolved to sufficient specificity to perform a device access, the filer front end sends a request to one or more of the storage devices, and receives a response. A response can often be the contents of one or more blocks of data. In some cases, a response can be formatted as a series of transmissions that comprise the contents of a series of blocks of data that constitute the named file. Such a filer paradigm serves the purpose of storing and retrieving named files over a network; however, such a filer paradigm merely serves file data along with a very limited set of information (e.g., creation date, owner, version, etc.) pertaining to the file. In scenarios where a client machine is implemented as a virtual machine, it is expedient to extend the filer paradigm to include any aspect of the virtual machine when saving a state snapshot. For example, rather than merely storing and managing files, the entire contents of a virtual machine, including memory state, peripheral state, execution state, files, etc., can be stored and managed. In scenarios where client machines are implemented as virtual machines, it may be convenient to implement any of the foregoing functions in a filer virtual machine (filer VM). In some cases, a filer VM is deployed on one node among a group of nodes, any of which can host any sort of virtual machine. FIG. 1A2 depicts one example of such an environment.
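The filer front end's mapping from a file access request (name and optional version) to a starting physical location of the kind described for FIG. 1A1 can be sketched with a simple lookup table. The table contents, the `PhysicalLocation` fields, and the "latest version when none is given" rule are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalLocation:
    unit_id: str        # drive or unit ID
    volume: str         # extent or volume name
    first_block: int    # logical block identifier of the first block
    block_count: int


# Hypothetical front-end table: (file name, version) -> physical location.
LOCATION_TABLE = {
    ("F1", 1): PhysicalLocation("disk-0", "vol-a", 4096, 8),
    ("F1", 2): PhysicalLocation("disk-1", "vol-b", 128, 10),
}


def resolve(name, version=None):
    """Map a client file access request to a starting physical location.
    If no version is given, the latest stored version is used."""
    if version is None:
        candidates = [v for (n, v) in LOCATION_TABLE if n == name]
        if not candidates:
            raise FileNotFoundError(name)
        version = max(candidates)
    try:
        return LOCATION_TABLE[(name, version)]
    except KeyError:
        raise FileNotFoundError(f"{name} v{version}") from None


def block_ids(loc):
    # The series of logical blocks the front end would request from the device.
    return list(range(loc.first_block, loc.first_block + loc.block_count))
```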

FIG. 1A2 depicts a multi-node computing environment 1A200 including a filer virtual machine. As shown, a plurality of VMs (e.g., user VM1, user VM2, user VM3, user VM4, user VM5, user VMN, and the filer VM 110) are deployed over a communication fabric such that any VM can at least potentially communicate with any other VM. Moreover, and as shown, the filer VM 110 receives and processes any and/or all forms of storage I/O as may be raised by any of the user VMs. Furthermore, the filer VM can access (e.g., using direct access, agent support, or hypervisor support) any state of any user VM.

Having such access to the entirety of the VMs enables the filer VM to generate versions of entire VMs at any moment in time (see FIG. 1A3). In some situations, a filer VM can be configured to take periodic snapshots of one or more VMs, and can store such snapshots in the storage devices. Versioned VM images can be stored for subsequent access (e.g., see VM1V1, VM1V2, VM2V1, VM2V2, VM1V3, etc.). A subsequent access can include a retrieval of all or portions of the contents of a virtual machine at the time of a particular snapshot, including its memory state, peripheral state, execution state, file state, etc., as of that time. All or portions of the contents of a virtual machine can be saved using a filer VM. The filer VM can save (e.g., to persistent storage on any one or more of the storage devices) any of a variety of virtual machine objects, virtual disks, and versions thereof (e.g., see file F1V1, F1V2, etc.). Such a virtual machine object can be stored in association with a name or other identifier, possibly referring to the virtual machine and possibly referring to a time or version sub-identifier. As examples, a virtual machine object for virtual machine VM1 is saved for each of the several versions shown as “VM1V1”, “VM1V2”, and “VM1V3”. Also shown are virtual machine objects for VM2, namely “VM2V1” and “VM2V2”. Any virtual machine can subsume one or more virtual disks (e.g., comprising files), and such a virtual disk 103, including its contents and its metadata, can be indexed to form a virtual machine snapshot index (see index 101) and stored persistently in storage devices.

The filer VM can create and maintain an index 101 that stores a wide range of attributes and their values as captured at the time a particular snapshot was taken. As such, one or more virtual machine objects can be identified and retrieved through a query or request made to the index.

A request can be made to such a filer VM to retrieve any aspect of a snapshotted VM, including its memory state, peripheral state, execution state, file state, etc., at a particular time, some of which states are shown as virtual machine state attributes (see FIG. 1B2). Such a request can be formulated as a query having an arbitrarily large number of query terms and values that are used to look up (e.g., using the index 101) any one or more snapshots that satisfy the query.
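One way to answer a query with many terms against an attribute index is to keep, for every (attribute, value) pair, the set of snapshot IDs that have it, and then intersect those sets. The sketch below assumes that inverted-index layout, hashable attribute values, and illustrative attribute names (`vm`, `power`, `nic0`); none of these details come from the source, and an empty query is defined here to match nothing.

```python
def build_index(snapshots):
    """snapshots: snapshot ID -> {attribute: value}.
    Returns (attribute, value) -> set of snapshot IDs."""
    index = {}
    for snap_id, attrs in snapshots.items():
        for item in attrs.items():
            index.setdefault(item, set()).add(snap_id)
    return index


def query(index, terms):
    """Return IDs of snapshots matching every attribute=value term."""
    result = None
    # Intersect the smallest candidate sets first to keep intermediates small.
    postings = sorted((index.get(item, set()) for item in terms.items()), key=len)
    for ids in postings:
        result = ids if result is None else result & ids
        if not result:
            break  # no snapshot can match all remaining terms
    return set(result or ())
```

Each term costs one hash lookup, so adding query terms narrows the answer without requiring a scan over all stored snapshots.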

FIG. 1A3 depicts an environment 1A300 where virtual machine object version control is provided. As an option, one or more variations of environment 1A300 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein.

An instance of the heretofore introduced virtual machine (VM) refers to a specific software-based implementation of a machine in a virtualization environment. In exemplary embodiments of a VM, hardware resources of a real computer (e.g., CPU, memory, etc.) are virtualized or otherwise transformed to provide managed access to an operating system (e.g., Linux, Unix, etc.) and to physical resources (e.g., CPU physical resources 104, RAM memory resources 106, NVRAM memory resources 107, instances of local peripheral resources 114, etc.).

Hypervisor

A virtual machine is often implemented as one or two layers of software that run on top of real hardware. In such a two-layer implementation, applications can run in a user virtual machine (e.g., UVM 102₁, UVM 102_N) or in a control virtual machine (e.g., filer VM 110) that runs in a layer on top of the hypervisor. The hypervisor defines or has access to virtual representations of physical resources. Strictly as examples, the hypervisor can maintain data structures that comprise current representations of open files and/or open file descriptors, paging register entries, allocated memory descriptors, CPU shadow registers, lists of open files, etc. The aforementioned representations can be accessed and/or traversed so as to identify contents. For example, a file descriptor can be used to access and/or traverse the blocks that comprise the file, the allocated memory descriptors can be accessed and/or traversed so as to identify contents of allocated memory, etc.

A hypervisor 130 implements a thin layer of software directly on the computer hardware or on a host operating system. This hypervisor layer of software contains a virtual machine monitor that allocates hardware resources dynamically and transparently. Multiple operating systems run concurrently on a single physical computer and share hardware resources with each other. By encapsulating an entire machine, including the CPU, memory, operating system, and network devices, a virtual machine becomes compatible with underlying operating systems, its hosted applications, and its device drivers. Most modern implementations allow several operating systems and applications to safely run at the same time on a single computer, with each having access to the resources it needs when it needs them.

Filer Virtual Machine

A filer virtual machine can be implemented on a node, and any number of nodes can be instanced into a computing mesh (e.g., comprising racks, blades, data processors, cores, etc.). As shown, the filer virtual machine interfaces with one or more storage devices (e.g., S₁ 122, S₂ 120, S₃ 128, S_N 129, etc.). Access to such storage devices can be facilitated by using a storage access layer (e.g., an IO control layer). In this embodiment, the storage devices can be heterogeneous, possibly including instances of block storage and file storage, and possibly including instances of storage media to implement a high-performance and high-reliability multi-tier storage architecture.

The shown filer virtual machine implements user storage volumes as virtual disks (e.g., VD₁ 119₁, VD₂ 119₂, VD₃ 119₃, VD_N 119_N, etc.). Such virtual disks, as shown, are representations of virtual storage volumes as may be allocated by any of the several virtual machines. Portions of the virtual storage volumes can be held (e.g., as metadata or as cached data) in locations that are locally accessible by the filer VM. Contents of such virtual disks can be stored in a storage pool 116, which can comprise networked storage 121 (e.g., network-attached storage (NAS) or a storage area network (SAN), etc.), and/or any number of instances of node-local hard disk drives (e.g., local storage 118_HDD) and/or any number of node-local solid-state storage drives (e.g., local storage 118_SSD). Such virtual disks maintain volume indexes comprising access points (e.g., volumes, paths, logical block identifiers, etc.) to user data that can be persisted to the shown storage devices. In some cases, and as shown, a filer daemon 134 maintains volume indexes (e.g., volume index 124₁, volume index 124₂, volume index 124₃, volume index 124_N, etc.). A volume index is maintained for each virtual disk (see FIG. 1B1 and FIG. 1B2). A volume index (see FIG. 1C) comprises characteristics of files stored in the virtual disk. The filer daemon can run autonomously within a filer virtual machine, and can make a snapshot copy of a volume index at any moment in time, possibly under direction by one or more agents (e.g., agent 117₁, agent 117₂), which can be accessed locally (e.g., by subroutine or method calls) or from outside the filer virtual machine by a remote process (e.g., via a remote process call interface such as the shown RPC I/O 108).
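A filer daemon that keeps one volume index per virtual disk and can copy any of them at an arbitrary moment might be organized as below. `FilerDaemon` and its methods are hypothetical names; a deep copy stands in for whatever copy-on-write or persistence scheme a real implementation would use, and it simply guarantees that later updates to the live index do not alter a saved copy.

```python
import copy


class FilerDaemon:
    """Hypothetical sketch: one volume index per virtual disk, plus
    point-in-time copies taken on request (e.g., from an agent)."""

    def __init__(self, virtual_disks):
        # virtual disk ID -> {path: file characteristics}
        self.volume_indexes = {vd: {} for vd in virtual_disks}
        # (virtual disk ID, snapshot ID) -> frozen copy of that volume index
        self.index_snapshots = {}

    def record_file(self, vd, path, **characteristics):
        self.volume_indexes[vd][path] = characteristics

    def snapshot_index(self, vd, snapshot_id):
        # Deep copy so later updates to the live index do not leak into the copy.
        saved = copy.deepcopy(self.volume_indexes[vd])
        self.index_snapshots[(vd, snapshot_id)] = saved
        return saved
```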

FIG. 1A4 depicts an environment 1A400 where virtual machine object version control is provided for two or more systems. As an option, one or more variations of environment 1A400 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein.

The environment 1A400 depicts multiple systems (e.g., a first system, a second system, etc.), each of which is individually addressable by a uniquely assigned IP address. As shown, the first system (#1) has an “IP #1 Address”, the second system has an “IP #2 Address”, and so on. Any virtual machine within its respective system (e.g., VM₁₁, VM₁₂, etc.) can be addressed individually by a portion of the IP address. Further, any system can comprise any number of virtual disks (e.g., VD₁₁, VD₁₂, etc.) that are accessible by any virtual machine in its respective system. As shown, two systems (e.g., the first system and the second system) each access the same file F1. The blocks that comprise file F1 are stored as persistent data in storage devices (e.g., S1, S2, etc.). Accesses to the shared file F1 by the first system may be different from accesses to the shared file F1 by the second system. An administrator might want to know the entire and up-to-date status of file F1, and thus would want to access an index (e.g., an index of a snapshot) that includes aspects of file F1 as may have been generated by the first system, the second system, or both.

A filer virtual machine can access physical storage of the data in any virtual disk. Such access is facilitated by the shown filer VM 110, which in turn accesses the storage pool to store persistent data to storage devices S₁, S₂, etc. In this example, filer VM 110 is situated on a third system; however, the filer VM 110 can be situated on any system that can access files in the storage pool.

In this architecture, the filer virtual machine can generate snapshots of files, where the snapshots (e.g., changes to the underlying file) derive from activities that originate from two or more systems. Moreover, the filer virtual machine can generate volume indexes for groups of virtual disks or snapshots of virtual disks such that the underlying information (e.g., information from the multiple systems) is indexed in the volume index. An index so constructed facilitates fast access to the underlying data (e.g., file information) such that a system administrator can quickly identify a file or snapshot based on a lookup using portions of system information.
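An index that combines what several systems report about a shared file, and that also supports lookups by system, could be built as in the sketch below. The event tuple format, the attribute names, and the IP addresses are placeholders invented for the example; the point is that the per-file view holds every system's contribution, while a reverse map answers "which files did this system touch" without scanning.

```python
from collections import defaultdict


def build_multi_system_index(events):
    """events: (system_ip, file_name, attribute, value) tuples reported by
    each system. Returns (by_file, by_system):
      by_file:   file -> system IP -> {attribute: latest value}
      by_system: system IP -> set of files that system touched
    """
    by_file = defaultdict(lambda: defaultdict(dict))
    by_system = defaultdict(set)
    for system_ip, file_name, attribute, value in events:
        by_file[file_name][system_ip][attribute] = value  # later events win
        by_system[system_ip].add(file_name)
    return by_file, by_system
```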

As shown, the embodiment of FIG. 1A4 depicts a filer virtual machine that is configured to process any number of transactions from any number of systems over any number of virtual disks. The virtual disks implement a data structure in the form of a hierarchy or tree composed of nodes, the nodes being files or directories, where directories can comprise further files or directories. Any node can be accessed uniquely by a path. An example of a virtual disk hierarchical path layout is given in the following FIG. 1B1 and FIG. 1B2.

FIG. 1B1 and FIG. 1B2 present depictions of an index that is embodied using a virtual disk hierarchical path layout used in systems that provide virtual machine object version control in addition to native operating system file system functions. As an option, one or more variations of the virtual disk hierarchical path layout or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. Also, the virtual disk hierarchical path layout or any aspect thereof may be implemented in any environment.

The virtual disk 119₀ stores a tree-oriented data structure that can be used to codify hierarchical paths through nodes (e.g., folders, directories, files, links, etc.) of the tree-oriented structure to root nodes. Further, the shown virtual disk hierarchical path layout includes a list of snapshot sequences for files. Strictly as an example, the file depicted as “F1” is associated with an open-ended snapshot sequence that is codified as “{S0F1T0, S1F1T1, . . . }”. This codification refers to snapshot “S0” for file “F1” at time=T0, thus “S0F1T0”, and to snapshot “S1” for file “F1” at time=T1, thus “S1F1T1”. As such, any file can have any number of uniquely named snapshots. Using the unique names, the set of snapshots pertaining to a particular file can be deciphered from the list and presented to a user (e.g., for selection in response to a user's restoration request).

Codification of a list of snapshot sequences for files can include indications that a file has been deleted or marked for deletion. As an example, FIG. 1B1 indicates a codification sequence for file F2 that shows an initial snapshot “S0F2T0” and then a deletion of file F2. In this example, the codification of the list of snapshot sequences for file F2 shows an associated snapshot, namely “S0F2T0”, before the file was deleted. The path to file F2 can remain in the virtual disk hierarchical path layout indefinitely, even though the file has been deleted or marked for deletion. A newly-opened file F2 can be represented in the virtual disk hierarchical path layout even though, at an earlier time, there was an earlier designation of file F2. As an example, the newly-opened file F2 can be represented in the virtual disk hierarchical path layout using a unique identifier (e.g., the inode number, which is unique for each file).
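The snapshot naming convention can be made concrete with a minimal Python sketch. It assumes a hypothetical grammar in which names look like `S<seq>F<file number>T<time index>` and in which a deletion is recorded by appending a hypothetical `DELETED` marker to the file's sequence; neither the regular expression nor the marker is specified by the disclosure.

```python
import re
from dataclasses import dataclass

# Hypothetical grammar for names such as "S0F1T0":
# snapshot index, file name, time index.
SNAPSHOT_NAME = re.compile(r"^S(\d+)(F\d+)T(\d+)$")
DELETED = "DELETED"  # hypothetical marker recorded when a file is deleted


@dataclass(frozen=True)
class SnapshotName:
    seq: int
    file: str
    time: int


def parse_snapshot_name(name: str) -> SnapshotName:
    m = SNAPSHOT_NAME.match(name)
    if m is None:
        raise ValueError(f"not a snapshot name: {name!r}")
    return SnapshotName(int(m.group(1)), m.group(2), int(m.group(3)))


def restorable_snapshots(sequence: list) -> list:
    """Snapshots in a file's sequence, skipping the deletion marker."""
    return [parse_snapshot_name(s) for s in sequence if s != DELETED]


def is_deleted(sequence: list) -> bool:
    """A file counts as deleted if its sequence ends with the marker."""
    return bool(sequence) and sequence[-1] == DELETED
```

With this representation, the sequence `["S0F2T0", DELETED]` still yields the S0F2T0 entry, so a restore dialog could offer it even though F2 itself has been deleted.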

The virtual disk hierarchical path layout can further codify the situation where a file is unchanged from its initial snapshot. In this situation, a file is created (e.g., F3), a snapshot is taken at time t=T₀ (e.g., “S0F3T0”), and after time t=T₀ there are no further changes to the file, even through a later time t=T_N.

In some cases, strictly as an example, the hierarchical path layout can be organized from a root node (e.g., root label=“/root”) and children of the root node labeled as pertaining to users (e.g., first level label=“User1”, “User2”, “UserN”, etc.). The children of the user nodes can represent folders or directories (e.g., “U1Directory1”, “U1Directory2”, etc.), and child nodes of a folder or directory can represent further folders or directories and/or files (e.g., “F1”, “F2”, etc.). By concatenating node names from a root node to a subject node, a unique path name (e.g., path 132) can be generated, and such a unique name concatenation can be used to refer to a particular node. As shown, an example path can be given as “/root/User1/U1Directory1/F1”.
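Path generation by concatenation can be illustrated with a short Python sketch. The `Node` class below is a hypothetical in-memory tree, not the on-disk layout of the virtual disk; it only shows how joining node names from the root downward yields one unique path per node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: dict[str, "Node"] = field(default_factory=dict)

    def add(self, name: str) -> "Node":
        # Reuse an existing child of the same name so each path stays unique.
        return self.children.setdefault(name, Node(name))


def all_paths(node: Node, prefix: str = "") -> list[str]:
    """Concatenate node names from the root down to every node in the tree."""
    path = f"{prefix}/{node.name}"
    paths = [path]
    for child in node.children.values():
        paths.extend(all_paths(child, path))
    return paths
```

For example, `root = Node("root")` followed by `root.add("User1").add("U1Directory1").add("F1")` produces the path "/root/User1/U1Directory1/F1" among the paths returned by `all_paths(root)`.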

As depicted in FIG. 1B2, an index 101 can comprise any of a set of virtual machine attribute values, including any virtual machine states (see the shown virtual machine state attributes 191), which states can be codified into a volume index data structure 125, and which states, by their names and/or state values, can be retrieved through a query. Strictly as an example, a query can be formed to determine whether the allocated memory size of any of the virtual machines in the index has exceeded a particular threshold value (e.g., 14.5 GB). Additional examples are shown and discussed as pertaining to the management interface (e.g., see FIG. 6).
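The memory-threshold query can be sketched in a few lines of Python. The entry layout and the attribute name `allocated_memory_gb` are hypothetical; the disclosure names neither. Entries that lack the attribute are simply treated as not exceeding the threshold.

```python
from dataclasses import dataclass


@dataclass
class VmStateEntry:
    vm_name: str
    snapshot_time: str
    attributes: dict  # hypothetical, e.g. {"allocated_memory_gb": 16.0}


def vms_exceeding(index, attribute: str, threshold: float) -> set:
    """Names of VMs whose recorded attribute exceeded the threshold in any snapshot."""
    return {
        e.vm_name
        for e in index
        if e.attributes.get(attribute, float("-inf")) > threshold
    }
```

Calling `vms_exceeding(index, "allocated_memory_gb", 14.5)` corresponds to the 14.5 GB example above.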

All or portions of the virtual disk 119₀ (see FIG. 1B1) or virtual disk 119₁ (see FIG. 1B2) are laid out to contain metadata (e.g., conforming to a volume index data structure). One embodiment of such metadata as used in an index 101 is shown and described as pertaining to FIG. 1C.

FIG. 1C depicts a metadata layout to implement a volume index 1C00 as used in systems that provide virtual machine object version control in addition to native operating system file system functions. As an option, one or more variations of volume index 1C00 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. Also, the volume index 1C00 or any aspect thereof may be implemented in any environment.

As shown, the volume index data structure 125 comprises data items in a metadata layout that corresponds to each entry in the volume. Each row of the volume index corresponds to an object, and the leftmost column codifies the path to that object (e.g., see object path 190).

In addition to the object path, each row can include several sets of attributes. As shown, a first set of attributes originates from or derives from a first system (e.g., system #1), and a second set of attributes originates from or derives from a second system (e.g., system #2). Strictly as examples, information from the first system and/or the second system can comprise IP addresses, a timestamp from each of the first and/or second system, information tags pertaining to the first and/or second system, or any derivations or combinations thereof.

Each row can include sets of attributes that pertain to a virtual machine state. Such attributes and their state values can be stored in the index as virtual machine state attribute metadata 192. As shown, a first set of attributes (e.g., VM1A1, VM1A2) originates from or derives from a first virtual machine (e.g., VM1), and a second set of attributes (e.g., VM2A1, VM2A2) originates from or derives from a second virtual machine (e.g., VM2).

In addition to the aforementioned metadata, the volume index data structure 125 includes any number of cells that capture a list of blocks. The list of blocks can be codified directly in the cell (e.g., by direct codification of a list of blocks) or indirectly by reference to another location such as an inode, which in turn codifies a list of blocks. The number of columns can increase over time so as to capture the occurrence of a version snapshot. As shown, the columns to the right of the file attributes capture aspects of the occurrence of a snapshot at three different times, namely at time T=T0 (e.g., the initial block set), time T=T1 (e.g., the block set comprising the file at time T1), and time T=TN (e.g., the block set comprising the file at time TN).
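One row of such a volume index can be modeled in Python as sketched below. The class and field names are hypothetical; the sketch only mirrors the described layout: an object path, per-system attributes, per-VM state attributes, and one block-list cell per snapshot time, where each cell holds either the block list itself or a reference to an inode.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class InodeRef:
    inode_number: int  # indirect codification: the inode holds the block list


# A cell is either a direct list of block ids or a reference to an inode.
BlockCell = Union[list[str], InodeRef]


@dataclass
class VolumeIndexRow:
    object_path: str
    system_attributes: dict[str, dict] = field(default_factory=dict)
    vm_state_attributes: dict[str, dict] = field(default_factory=dict)
    # One cell per version snapshot, keyed by snapshot time ("T0", "T1", ...).
    snapshots: dict[str, BlockCell] = field(default_factory=dict)

    def record_snapshot(self, time_label: str, cell: BlockCell) -> None:
        # Adding a key here plays the role of adding a column to the right.
        self.snapshots[time_label] = cell
```

Keying the cells by time label keeps them in insertion order, which matches the left-to-right growth of snapshot columns over time.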

Using the aforementioned techniques and data structures, a version snapshot data structure can be generated as follows:

1. Receive a snapshot signal that initiates a snapshotting operation;
2. Generate a version snapshot list (e.g., a list of block identifiers that comprise a version snapshot file);
3. Generate an entry into a version index; and
4. Store the version snapshot list in a persistent storage location.
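The four steps above can be sketched in Python under stated assumptions: the version index is a plain dictionary mapping file path to {time label: storage location}, and "persistent storage" is a JSON file in a caller-supplied directory. The function name, index shape, and file naming are hypothetical illustrations.

```python
import json
from pathlib import Path


def take_version_snapshot(file_blocks, version_index, file_path, time_label, store_dir):
    """Steps 2-4 above; calling this function plays the role of the snapshot signal (step 1)."""
    # Step 2: the version snapshot list is a copy of the file's current block ids,
    # so later changes to the live file do not alter it.
    snapshot_list = list(file_blocks)
    # Step 3: add an entry to the version index, keyed by file path and time label.
    safe_name = file_path.strip("/").replace("/", "_")
    location = Path(store_dir) / f"{safe_name}.{time_label}.json"
    version_index.setdefault(file_path, {})[time_label] = str(location)
    # Step 4: persist the list at the location the index entry points to.
    location.write_text(json.dumps(snapshot_list))
    return snapshot_list
```

Copying the block list, rather than keeping a reference to it, is what lets the snapshot stay fixed while the live file continues to change.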

In some deployments, changes to a virtual machine object can occur frequently. For example, a virtual machine object may change in terms of its contents many times during its life. Moreover, a virtual machine object may be subject to operations that add one or more blocks to it (e.g., where the virtual machine object size increases) and/or operations that remove one or more blocks from it (e.g., where the virtual machine object size decreases). Such functions (e.g., operations to “add blocks” and operations to “delete blocks”) can be used repeatedly over the lifetime of a virtual machine object. A virtual machine object can be stored as a file in a storage pool, and can be indexed in a virtual machine index. A set of virtual machine index maintenance techniques is shown and discussed as pertains to the following figures.

FIG. 1D depicts a volume index generation technique 1D00 as used in systems that provide virtual machine object version control. As shown, a process flow commences upon receiving (e.g., from a filer virtual machine) a signal that initiates sending at least one quiescence request to at least one user virtual machine (see step 162). The filer virtual machine receives an acknowledgement signal from the at least one user virtual machine, which acknowledgement signal indicates an open time window during which a snapshot can be taken (see step 163). Upon quiescence, the filer virtual machine requests a set of virtual machine attribute values from the at least one user virtual machine (see step 164). At least some of the set of virtual machine attribute values are used to generate a volume index data structure comprising at least some of the virtual machine attribute values (see step 165). The index is stored to a persistent storage location (see step 166).
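Steps 162 through 166 can be sketched in Python as follows. `UserVm` is a hypothetical stand-in whose `quiesce()` returns an acknowledgement immediately; a real user virtual machine would answer asynchronously. The `persist` callback stands in for the persistent storage location.

```python
import json
from dataclasses import dataclass, field


@dataclass
class UserVm:
    """Hypothetical stand-in for a user virtual machine."""
    name: str
    state: dict = field(default_factory=dict)
    quiescent: bool = False

    def quiesce(self) -> bool:
        self.quiescent = True
        return True  # acknowledgement: a snapshot window is now open

    def attribute_values(self) -> dict:
        return dict(self.state)


def generate_volume_index(vms, persist) -> dict:
    acks = [vm.quiesce() for vm in vms]  # steps 162 and 163
    if not all(acks):
        raise RuntimeError("not every user VM acknowledged quiescence")
    # Steps 164 and 165: gather attribute values and build the index.
    index = {vm.name: vm.attribute_values() for vm in vms}
    persist(json.dumps(index, sort_keys=True))  # step 166
    return index
```

Passing `list.append` as `persist` is enough for experimentation; a file or object-store writer would take its place in practice.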

Some of the foregoing steps can be facilitated by one or more agents,aspects of which are discussed hereunder.

FIG. 2 depicts volume index maintenance techniques 200 as used in systems that provide virtual machine object version control in addition to native operating system file system functions. As an option, one or more variations of volume index maintenance techniques 200 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. Also, the volume index maintenance techniques 200 or any aspect thereof may be implemented in any environment.

As previously indicated, a file may be subject to ongoing operations that affect the block content (e.g., an update operation), and/or operations that increase the size of the file (e.g., an add operation), and/or operations that decrease the size of the file (e.g., a delete operation). Such operations can be undertaken or initiated by a computational element such as an agent 117. In particular, operations to “add blocks”, operations to “delete blocks”, and operations to “update blocks” can be implemented as an add function 202, a delete function 204, and an update function 206, respectively. Strictly as one example, an add function 202 can be defined to codify a list of blocks to be added to the makeup of the file, and the add function can then formulate a request 210 that results in a modification of the volume index 124₀ to record the addition of the added blocks.
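A minimal Python sketch of the add, delete, and update functions as request builders follows. The `IndexRequest` shape is hypothetical, and representing an update as a mapping from old to new block ids (as a copy-on-write store might) is an assumption; `apply_request` shows the block list a volume-index row would hold after each request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexRequest:
    """Hypothetical shape for request 210."""
    op: str         # "add", "delete" or "update"
    path: str
    payload: tuple  # block ids, or (old_id, new_id) pairs for updates


def add_blocks(path, new_blocks):
    return IndexRequest("add", path, tuple(new_blocks))


def delete_blocks(path, removed_blocks):
    return IndexRequest("delete", path, tuple(removed_blocks))


def update_blocks(path, replacements):
    # Assumed semantics: an update maps each old block id to a new one.
    return IndexRequest("update", path, tuple(replacements.items()))


def apply_request(block_list, req):
    """The block list a volume-index row would hold after the request."""
    if req.op == "add":
        return block_list + [b for b in req.payload if b not in block_list]
    if req.op == "delete":
        return [b for b in block_list if b not in req.payload]
    if req.op == "update":
        mapping = dict(req.payload)
        return [mapping.get(b, b) for b in block_list]
    raise ValueError(f"unknown operation: {req.op}")
```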

To facilitate fast access to, and modification of, the volume index, an access function 216 can be implemented so as to accept a request 210, perform the access (e.g., a READ-MODIFY-WRITE access), and return a response 212 to the requestor. Access operations can include any of the aforementioned READ-MODIFY-WRITE accesses (e.g., to make changes to the row pertaining to a file in the index), or an access operation can add a row to the volume index. To facilitate fast lookup operations performed by the access function 216, some embodiments include a hashing block 218 that facilitates lookup of a file (e.g., by its path) in the volume index. In some cases, a Bloom filter is used to determine access paths within the volume index. Lookups of paths from an index can be implemented using any form of lookup (e.g., with or without a Bloom filter).
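An access function with a hashing block might look like the Python sketch below, assuming the index lives in memory. Hashing the path with SHA-1 is an illustrative choice that yields fixed-length keys (useful for an on-disk table); the lock makes each READ-MODIFY-WRITE atomic with respect to other callers in the same process.

```python
import hashlib
import threading


class VolumeIndex:
    """Sketch: rows keyed by a hash of the object path (the 'hashing block')."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    @staticmethod
    def _key(path: str) -> str:
        return hashlib.sha1(path.encode("utf-8")).hexdigest()

    def access(self, path, modify):
        """READ-MODIFY-WRITE one row; adds the row if it does not exist yet."""
        with self._lock:
            key = self._key(path)
            current = self._rows.get(key, [])
            updated = modify(list(current))
            self._rows[key] = updated
            return updated  # the response returned to the requestor

    def lookup(self, path):
        return self._rows.get(self._key(path))
```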

Certain implementations provide additional functions so as to facilitate taking (e.g., storing) snapshots that form consistent units. More specifically, some implementations of volume index maintenance techniques 200 include operations to send the writers of subject files a snapshot signal that carries the semantics of “complete operations and wait”. Such semantics are implemented in part by the shown quiesce function 208. As an example, the quiesce function 208 can interact with file reader-writers (see FIG. 3A and FIG. 3B) to achieve quiescence, and upon achieving quiescence, a consistent snapshot can be taken. In exemplary embodiments, a snapshot of all files identified within the volume index is taken by persisting a snapshot copy of the volume index (e.g., see persisted volume index snapshot copy 224). In some cases, the agent 117 sends a command 214 to a serializer 222 to serialize the volume index for communication over a network and storage in a non-volatile location. As shown, the serializer 222 has a companion deserializer 220 that can be used in restore operations (e.g., see FIG. 7).
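The serializer and deserializer pair can be sketched in Python, assuming the volume index is a JSON-compatible dictionary. The choice of JSON plus zlib compression and the write-then-rename persistence step are illustrative assumptions, not the format the disclosure prescribes.

```python
import json
import os
import zlib


def serialize_volume_index(index: dict) -> bytes:
    """Serializer sketch: deterministic JSON, compressed for network transfer."""
    return zlib.compress(json.dumps(index, sort_keys=True).encode("utf-8"))


def deserialize_volume_index(blob: bytes) -> dict:
    """Deserializer sketch, as a restore operation would use it."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def persist_snapshot_copy(index: dict, path: str) -> None:
    """Write the serialized index to non-volatile storage."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(serialize_volume_index(index))
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # readers see either the old copy or the new one
```

Writing to a temporary file and renaming it means a crash mid-write never leaves a half-written snapshot copy at the final path.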

The aforementioned operations to achieve quiescence can be facilitated by the agent through a messaging protocol with file reader-writers. Some examples of such a messaging protocol are given in the following FIG. 3A and FIG. 3B.

FIG. 3A depicts a snapshot version management technique 3A00 as used in systems that provide virtual machine object version control in addition to native operating system file system functions. As an option, one or more variations of snapshot version management technique 3A00 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. Also, the snapshot version management technique 3A00 or any aspect thereof may be implemented in any environment.

The embodiment shown in FIG. 3A includes a management interface 310₁, which is composed of a snapshot scheduling interface 312 and a snapshot restore interface 314. Interaction (e.g., user or agent interaction) with the snapshot scheduling interface 312 and/or the snapshot restore interface 314 may invoke activities to be performed by a snapshot commencement engine 316 and/or a restore commencement engine 318. In some cases, activities performed by the snapshot commencement engine 316 and/or the restore commencement engine 318 include communications with one or more agents (e.g., agent 117₁, agent 117₂), which may be (as shown) subsumed within a filer virtual machine.

In one embodiment, a user interacts with the management interface to specify a snapshot schedule, which may include a particular time to initiate activities pertaining to taking a snapshot, or may include a user request to take a snapshot at that moment in time. Taking a version snapshot that is internally consistent may require determining that all reader-writers are quiescent, at least during the time period that it takes to write out the blocks pertaining to that particular snapshot. As shown, an agent sends messages or other forms of signals (e.g., one or more instances of quiescence request signal 320) to reader-writers, and waits for acknowledgements (e.g., one or more instances of quiescence acknowledgement signal 322) to come back from the reader-writers. This protocol gives the reader-writers a chance to bring their processing to a quiescent state, at least to the point that each reader-writer deems its writes to the files considered in the snapshot set to be in an application-wise consistent state. The reader-writers can continue processing, so long as the processing or transactions do not result in writes to the files considered in the snapshot.

The agent may retrieve (e.g., from a hypervisor) and/or consult with a table (e.g., table T1) of user processes, daemons, or other reader-writers of the files considered in the snapshot. The actions involved in making determinations as to which blocks to consider in the snapshot are delayed by the agent until such time that all known reader-writers have sent acknowledgements (e.g., one or more instances of quiescence acknowledgement signal 322) to the agent. The agent can delay, and periodically check that all outgoing quiescence requests have been acknowledged. When the agent has determined that all outgoing quiescence requests have been acknowledged, the agent can continue processing. In some cases, the agent assigns a task master to coordinate activities of taking a snapshot, and the task master can in turn assign workloads to task slaves (e.g., see the embodiment of FIG. 4).

When the activities pertaining to persisting the version snapshot have concluded, the agent can release the reader-writers from their quiescent state, for example, by sending a quiescence release signal (e.g., quiescence release signal 323) to the reader-writers.
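The request, acknowledge, and release exchange can be modeled with Python threads. In this sketch each hypothetical `ReaderWriter` stands in for a process that has the files open, and `threading.Event` objects stand in for the quiescence request, acknowledgement, and release signals. The agent polls until every acknowledgement arrives (or a timeout expires), runs the snapshot, and always sends the release.

```python
import threading
import time


class ReaderWriter:
    """Hypothetical reader-writer that honours quiescence requests."""

    def __init__(self, name):
        self.name = name
        self.quiesce_requested = threading.Event()  # quiescence request signal
        self.acknowledged = threading.Event()       # quiescence acknowledgement signal
        self.released = threading.Event()           # quiescence release signal
        self.writes = 0

    def run(self, stop):
        while not stop.is_set():
            if self.quiesce_requested.is_set():
                # Any in-flight write has finished, so the files are consistent.
                self.acknowledged.set()
                self.released.wait()
                self.quiesce_requested.clear()
                self.acknowledged.clear()
                self.released.clear()
            self.writes += 1  # stand-in for one file write
            time.sleep(0.001)


def take_consistent_snapshot(writers, snapshot, timeout=5.0, poll=0.01):
    """Request quiescence, poll for every acknowledgement, snapshot, then release."""
    for w in writers:
        w.quiesce_requested.set()
    try:
        deadline = time.monotonic() + timeout
        while not all(w.acknowledged.is_set() for w in writers):
            if time.monotonic() > deadline:
                raise TimeoutError("not every reader-writer acknowledged quiescence")
            time.sleep(poll)
        return snapshot()  # no writer can write while this runs
    finally:
        for w in writers:
            w.released.set()
```

Placing the release in a `finally` clause means reader-writers are released even when the snapshot fails or the acknowledgements time out.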

In situations pertaining to a restore, a user may initiate a restore operation, and may select a particular version snapshot from which to process the restore. A user interface (e.g., see FIG. 6) may be provided to assist the user in making such a selection. When the user initiates the restore operation from such a user interface, the shown restore commencement engine communicates with an agent (e.g., agent 117₂), which in turn establishes a sufficiently quiescent environment in which the restore operations can begin. In some cases, the same protocol used in establishing quiescence for the snapshot commencement can be carried out prior to initiating file IO activities pertaining to the restore activity. In some situations, the agent may retrieve (e.g., from a hypervisor) and/or consult with a table (e.g., table T2) of user processes, daemons, or other reader-writers of the files considered in the snapshot being restored. The actions involved in restoration of the version snapshot can be delayed by the agent until such time as the environment is sufficiently quiescent to begin the restoration file IO operations.

In some environments, the agent within the filer virtual machine may communicate with an external agent (e.g., an agent that operates within an execution context other than within the filer virtual machine). Such a situation is shown and described as pertains to FIG. 3B.

FIG. 3B depicts a snapshot version management technique 3B00 using an external service to provide virtual machine object version control in addition to native operating system file system functions. As an option, one or more variations of the snapshot version management technique 3B00 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein.

The embodiment shown in FIG. 3B includes an external agent 317. The external agent can receive messages or other forms of signals (e.g., one or more instances of quiescence request signal 320) and respond with acknowledgements (e.g., one or more instances of quiescence acknowledgement signal 322). In this example, the external agent can handle aspects of determining various states of quiescence, possibly including managing processing within the overall environment so as to bring the overall environment to a known state before a version snapshot is taken. When the activities pertaining to taking a version snapshot have concluded, the agent can advise the external agent to release the reader-writers from their quiescent state. The external agent can act upon this advice, for example, by releasing the reader-writers under its purview.

FIG. 4 depicts messages and operations in a protocol 400 as used in systems that provide virtual machine object version control in addition to native operating system file system functions. As an option, one or more variations of protocol 400 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. Also, the protocol 400 or any aspect thereof may be implemented in any environment.

As aforementioned, an agent can assign a task master 402 to coordinate activities of taking a snapshot, and the task master can in turn assign workloads to task slaves (e.g., one or more instances of task slave 404). Coordination between a user interaction to define a version snapshot schedule and the conclusion of slave tasks that account for taking a version snapshot can be carried out using a protocol 400. In the shown portion of protocol 400, a user interacts with a management interface 310₂, which in turn invokes activities that conclude with taking a version snapshot. In this example, the user defines a schedule (e.g., see operation 412), which schedule is sent (e.g., see message 410) along with an initiation signal to a task master 402, which task master may wait (see operation 414) until an appointed time (e.g., as may be codified into the schedule). The task master may initiate actions to bring the system to a state of quiescence. For example, and as shown, the task master may send a quiesce command (see message 416₁) to agent 117₃. The agent can in turn parse the command (see step 418) and proceed to carry out all of, or portions of, the protocol discussed in FIG. 3A and FIG. 3B, and may wait for quiescence (see operation 420) before associating an index with a version snapshot (see operation 422) and then dumping the index of the subject virtual disk (see operation 424). In some situations, the act of taking a version snapshot may include dumping the indexes of multiple subject virtual disks. Accordingly, the agent may loop (see loop 426) to perform multiple associations and multiple dump operations. When the loop iterations are complete, the agent sends a completion indication (see message 428₁) to the task master, and may also send a completion indication (see message 428₂) to the management interface.

In some situations, the task master performs additional steps during the processing of a version snapshot. The shown slave-assisted snapshot processing 440 is invoked when a task master sends a quiesce command (see message 416₂) to an agent. The agent may merely recognize that the task master is to carry out slave-assisted snapshot processing, and may merely wait for some notification of completion, or may take no action at all. The task master may assign workloads to slave tasks (see message 430₁ and message 430₂). In some cases, the task master is responsible for invoking as many slaves as are deemed to participate in the set of slave operations (e.g., see slave operations 432). The slaves may individually and/or collectively perform multiple associations and multiple dump operations. When the slave operations pertaining to the slave-assisted snapshot processing are deemed to have been completed, the task master sends a completion indication (see message 428₃).
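Slave-assisted processing can be approximated in Python with a thread pool acting as the set of slaves. This is a sketch under assumptions: each workload is one hypothetical associate-and-dump step per virtual disk, the dump is returned in memory rather than written to storage, and the completion indication is a plain dictionary.

```python
from concurrent.futures import ThreadPoolExecutor


def associate_and_dump(vdisk_name, index, snapshot_id):
    """One slave workload: tag a disk's index with the snapshot, then dump it."""
    tagged = {"snapshot": snapshot_id, "vdisk": vdisk_name, "entries": dict(index)}
    return vdisk_name, tagged  # stand-in for writing the dump to durable storage


def task_master(vdisk_indexes, snapshot_id, max_slaves=4):
    """Assign one workload per virtual disk and report completion when all finish."""
    with ThreadPoolExecutor(max_workers=max_slaves) as slaves:
        futures = [slaves.submit(associate_and_dump, name, idx, snapshot_id)
                   for name, idx in vdisk_indexes.items()]
        dumps = dict(f.result() for f in futures)
    return {"status": "complete", "dumps": dumps}  # completion indication
```

Because `f.result()` re-raises any exception from a slave, a failed dump surfaces at the task master instead of being reported as complete.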

FIG. 5 depicts an example inode referencing technique 500 as used in volume index layouts that facilitate virtual machine object version control in addition to native operating system file system functions. As an option, one or more variations of inode referencing technique 500 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. Also, the inode referencing technique 500 or any aspect thereof may be implemented in any environment.

The embodiment shown in FIG. 5 is merely one example to show that a volume index (e.g., volume index 124₁, volume index 124₂, etc.) can codify the existence of a version snapshot using a pointer or other reference to one or more inodes (e.g., inode 510₀, inode 510₁, inode 510_N, etc.). As shown, an inode can be formed at any moment in time (e.g., see time=T0, time=T1, time=TN), and an inode comprises information pertaining to a version snapshot. Information pertaining to a version snapshot comprises a list of blocks. As shown, inode 510₀ includes a list of blocks given as the ordered listing of “OXABCD”, “OXABCE”, . . . and “OXFAAA”. At another time (e.g., time=T1), another inode can be generated, and it may comprise a different list of blocks.
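The one-inode-per-version idea can be sketched in Python. The `Inode` and `VersionedFile` classes and the shared inode-number counter are hypothetical simplifications: each add-block or delete-block operation creates a new inode holding the new ordered block list, while earlier inodes remain available for their version snapshots.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Inode:
    number: int
    blocks: tuple  # ordered block ids making up one version snapshot


class VersionedFile:
    """Each version snapshot points at its own inode; earlier inodes are kept."""

    _inode_numbers = count(1)  # hypothetical inode allocator shared by all files

    def __init__(self, blocks):
        self.versions = {"T0": Inode(next(self._inode_numbers), tuple(blocks))}
        self.current = "T0"

    def _new_version(self, time_label, blocks):
        self.versions[time_label] = Inode(next(self._inode_numbers), tuple(blocks))
        self.current = time_label
        return self.versions[time_label]

    def add_block(self, time_label, block_id):
        return self._new_version(time_label, self.versions[self.current].blocks + (block_id,))

    def delete_block(self, time_label, block_id):
        kept = [b for b in self.versions[self.current].blocks if b != block_id]
        return self._new_version(time_label, kept)
```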

Given a volume index, the occurrence of a file within the volume index can be determined using a file path identification module. Such a file path identification module (e.g., file path identification module 502) can search through the volume index to determine the existence of the requested file, or it can use searching, sorting, or filtering techniques, possibly including a Bloom filter, so as to determine the existence of a particular file or path in a particular volume index while incurring only a small amount of processing and only a small amount of latency.
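A Bloom filter for path existence checks can be sketched in Python as follows. The bit-array size and hash count are arbitrary illustrative defaults, and deriving several hash positions from slices of one SHA-256 digest is a common simplification. A Bloom filter never yields false negatives but can yield false positives, so the sketch confirms a positive answer against the index itself.

```python
import hashlib


class BloomFilter:
    """Path-membership filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=8192, num_hashes=4):
        if not 1 <= num_hashes <= 8:
            raise ValueError("one SHA-256 digest supplies at most 8 four-byte hashes")
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        for i in range(self.num_hashes):
            chunk = digest[4 * i: 4 * i + 4]  # four bytes per hash function
            yield int.from_bytes(chunk, "big") % self.num_bits

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def file_exists(path, bloom, index):
    # A cheap negative answer first; confirm positives against the index itself.
    return bloom.might_contain(path) and path in index
```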

FIG. 6 depicts an example snapshot version management user interface 600 as used in systems that facilitate virtual machine object version control in addition to native operating system file system functions. As an option, one or more variations of snapshot version management user interface 600 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. Also, the snapshot version management user interface 600 or any aspect thereof may be implemented in any environment.

The graphical user interface (e.g., GUI 602) shown in FIG. 6 can be presented to a user such that the user can review and select previously captured version snapshots. As shown, the management interface provides a set of screen devices (e.g., text boxes, pull-down menus, checkbox selectors, etc.) that facilitate user specification of a query, which query is in turn executed over a set of records pertaining to previously-captured version snapshots. In some situations, a user might define a date range or other limit (see “Enter Date Range or Limit”). The date range can be formatted as a “before date” specification, as an “after date” specification, or as an expression pertaining to a limit, such as “<3 days old”. In some cases, and as shown, a grouping of menu items can be displayed so as to facilitate user construction of a logical expression to be used in a query (see “Date Range or Limit Boolean”).
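The date-range and limit filtering can be sketched in Python. The limit grammar (`<N days old` or `>N days old`), the record field `taken_at`, and combining all conditions with a logical AND are assumptions for illustration; the GUI's Boolean grouping could equally build OR expressions.

```python
import re
from datetime import datetime, timedelta


def parse_limit(expr, now):
    """Turn a limit such as '<3 days old' into a predicate on a snapshot time."""
    m = re.fullmatch(r"\s*([<>])\s*(\d+)\s+days?\s+old\s*", expr)
    if m is None:
        raise ValueError(f"unsupported limit: {expr!r}")
    op, days = m.group(1), int(m.group(2))
    cutoff = now - timedelta(days=days)
    if op == "<":
        return lambda t: t > cutoff  # taken less than N days ago
    return lambda t: t < cutoff      # taken more than N days ago


def query_snapshots(snapshots, now, limit=None, before=None, after=None):
    """Keep snapshots matching every supplied condition (logical AND)."""
    predicates = []
    if limit is not None:
        predicates.append(parse_limit(limit, now))
    if before is not None:
        predicates.append(lambda t: t < before)
    if after is not None:
        predicates.append(lambda t: t > after)
    return [s for s in snapshots if all(p(s["taken_at"]) for p in predicates)]
```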

A result set is presented (e.g., before, during, or after construction of the query), and such a result set 604 comprises information pertaining to the individual version snapshots that are returned in response to the query. A scroll bar 606 is provided. A specific version snapshot can be selected or otherwise identified as the version snapshot to be restored (see the “Select” screen device 608).

The shown GUI 602 includes screen devices to facilitate user specification of commands (e.g., a restore command) or other actions to be taken and/or processed using the selected version snapshot. Strictly as examples, the commands, actions, and corresponding operations might include aspects of mounting the version snapshot (e.g., indicating user intent to restore from the selected version snapshot) and unmounting the version snapshot when the restore has completed.

Any of the user specifications and/or selections pertaining to user entries made using GUI 602 can be communicated to the filer virtual machine. If a restore request is sufficiently specified (e.g., through use of GUI 602), the filer virtual machine will initiate a restore procedure. A series of possible restore procedure actions is shown and discussed as pertains to FIG. 7.

FIG. 7 depicts an example version snapshot restore procedure 700. As an option, one or more variations of version snapshot restore procedure 700 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. Also, the version snapshot restore procedure 700 or any aspect thereof may be implemented in any environment.

The initial portions of the shown restore procedure include interaction with a user through an instance of the shown snapshot version management user interface 600. Specifically, a recipient process (e.g., a filer virtual machine) receives instructions, possibly in the form of a version specification or a query (see step 702). The recipient process returns results to the requestor. As shown, the recipient process executes the query and formats the results (e.g., see step 704). In some cases, a user might specify a particular set of mount and/or unmount options. In such cases, the recipient process formats the mount/unmount options for presentation to a restore process (see step 706), which might be initiated as a background process (see step 708). The restore process might take a short or a long amount of real time to perform the restore. In either case, the recipient process monitors the restore process (see step 710) through termination (e.g., through successful completion, through a timeout or error condition, or through termination for reasons other than successful completion). When the restore process terminates, the recipient process advises the user of the status (see step 712). In exemplary cases, the user is advised that the restore has been applied successfully and that the restored portions of the file system are ready.
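Steps 706 through 712 can be sketched in Python with `concurrent.futures`. The option formatting (bare flags, otherwise `key=value`), the status strings, and the `restore_fn` callback are hypothetical. Note that a restore that exceeds its timeout keeps running in its worker thread; the sketch only stops waiting for it and reports the timeout.

```python
import concurrent.futures as cf


def format_mount_options(options):
    """Step 706 sketch: flags appear bare, other options as key=value."""
    return ",".join(k if v is True else f"{k}={v}" for k, v in sorted(options.items()))


def run_restore(restore_fn, mount_options, timeout_s):
    """Steps 708-712 sketch: run in the background, monitor, report a status."""
    pool = cf.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(restore_fn, mount_options)  # background restore process
    try:
        future.result(timeout=timeout_s)             # monitor through termination
        status = "restore applied successfully"
    except cf.TimeoutError:
        status = "restore timed out"
    except Exception as exc:
        status = f"restore failed: {exc}"
    finally:
        # A timed-out restore keeps running; we only stop waiting for it.
        pool.shutdown(wait=False)
    return status                                    # advise the user
```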

ADDITIONAL EMBODIMENTS OF THE DISCLOSURE

Additional Practical Application Examples

FIG. 8A depicts a system 8A00 as an arrangement of computing modules that are interconnected so as to operate cooperatively to implement certain of the herein-disclosed embodiments. The partitioning of system 8A00 is merely illustrative and other partitions are possible.

FIG. 8A depicts a block diagram of a system to perform certain functions of a computer system. As an option, the present system 8A00 may be implemented in the context of the architecture and functionality of the embodiments described herein. Of course, however, the system 8A00 or any operation therein may be carried out in any desired environment.

The system 8A00 comprises at least one processor and at least one memory, the memory serving to store program instructions corresponding to the operations of the system. As shown, an operation can be implemented in whole or in part using program instructions accessible by a module. The modules are connected to a communication path 8A05, and any operation can communicate with other operations over communication path 8A05. The modules of the system can, individually or in combination, perform method operations within system 8A00. Any operations performed within system 8A00 may be performed in any order unless as may be specified in the claims.

The shown embodiment implements a portion of a computer system, presented as system 8A00, comprising a computer processor to execute a set of program code instructions (see module 8A10) and modules for accessing memory to hold program code instructions to perform: receiving a signal that initiates a file version snapshot operation on a subject file (see module 8A20); requesting a set of process identifiers from a hypervisor, where at least some of the processes corresponding to respective process identifiers have the subject file open (see module 8A30); sending a message to the set of processes that have the subject file open to request a temporary suspension of file IO over the subject file (see module 8A40); receiving an acknowledgement signal from the set of processes that have the subject file open (see module 8A50); traversing a list of block identifiers that comprise the subject file to generate a version snapshot list (see module 8A60); storing the version snapshot list in a storage location referenced by a file system index (see module 8A70); and sending a message to the set of processes to release the temporary suspension (see module 8A80).

In many situations the subject file is modified over time. Strictly as examples, an add block operation can be performed on the subject file, in which case a new inode corresponding to the modified file is generated that adds the new block identifier referring to the new block added to the subject file. The previous inode can remain, and can be used to refer to its corresponding version snapshot. A subject file can become smaller as a result of a delete block operation, in which case a new inode is generated that removes the deleted block from the list of block identifiers that comprise the new version snapshot of the subject file. The index of a virtual disk can be persisted by serializing the system index and storing it in a nonvolatile storage location. Such a persisted, serialized system index can be used in procedures that restore a particular version snapshot. Strictly as one example, a restore procedure for a particular version snapshot can commence by reading from the nonvolatile storage location to retrieve, and possibly deserialize, a stored file system index that corresponds to the particular version snapshot. Aspects of the stored file system index can be presented in a graphical user interface, from which a user can specify characteristics of an intended restore procedure. Such a restore procedure (e.g., using a background restore process) can be initiated in response to a user-indicated restore request.
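The inode-per-version behavior and the serialize and restore path described above can be sketched as follows. The sketch assumes, for illustration only, that an inode is an immutable tuple of block identifiers, that inode numbers are assigned sequentially, and that JSON serves as the serialized form of the index; `Inode`, `VersionedFile`, and `restore` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Inode:
    """Hypothetical immutable inode: one version's list of block identifiers."""
    number: int
    blocks: Tuple[int, ...]


class VersionedFile:
    def __init__(self, blocks: List[int]):
        self.inodes: Dict[int, Inode] = {1: Inode(1, tuple(blocks))}
        self.current = 1

    def _new_inode(self, blocks: Tuple[int, ...]) -> int:
        # Never edit an existing inode; allocate the next number instead.
        number = max(self.inodes) + 1
        self.inodes[number] = Inode(number, blocks)
        self.current = number
        return number

    def add_block(self, block_id: int) -> int:
        # The previous inode is kept, so it still refers to its snapshot.
        return self._new_inode(self.inodes[self.current].blocks + (block_id,))

    def delete_block(self, block_id: int) -> int:
        remaining = tuple(b for b in self.inodes[self.current].blocks if b != block_id)
        return self._new_inode(remaining)

    def serialize(self) -> str:
        # Persistable form of the index: inode number -> block identifier list.
        return json.dumps({str(n): list(i.blocks) for n, i in self.inodes.items()})


def restore(serialized: str, inode_number: int) -> List[int]:
    # Deserialize the stored index and return one version's block list.
    return json.loads(serialized)[str(inode_number)]
```

Because each modification produces a new inode rather than editing the previous one, every earlier inode continues to describe its version snapshot, and restoring a version reduces to a lookup in the deserialized index.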

FIG. 8B depicts a system 8B00 as an arrangement of computing modules that are interconnected so as to operate cooperatively to implement certain of the herein-disclosed embodiments. The partitioning of system 8B00 is merely illustrative and other partitions are possible. As an option, the system 8B00 may be implemented in the context of the architecture and functionality of the embodiments described herein. Of course, however, the system 8B00 or any operation therein may be carried out in any desired environment.

The system 8B00 comprises at least one processor and at least one memory, the memory serving to store program instructions corresponding to the operations of the system. As shown, an operation can be implemented in whole or in part using program instructions accessible by a module. The modules are connected to a communication path 8B05, and any operation can communicate with other operations over communication path 8B05. The modules of the system can, individually or in combination, perform method operations within system 8B00. Any operations performed within system 8B00 may be performed in any order unless otherwise specified in the claims.

The shown embodiment implements a portion of a computer system, presented as system 8B00, comprising a computer processor to execute a set of program code instructions (see module 8B10) and modules for accessing memory to hold program code instructions to perform: receiving, by a first system, a signal that initiates a file version snapshot operation on a subject file (see module 8B20); requesting a set of process identifiers from an operating system, wherein at least some of the processes corresponding to respective process identifiers have the subject file open (see module 8B30); sending a message to the set of processes that have the subject file open to request a temporary suspension of file IO over the subject file (see module 8B40); receiving an acknowledgement signal from the set of processes that have the subject file open (see module 8B50); traversing a list of block identifiers that comprise the subject file to generate a version snapshot data structure (see module 8B60); generating an index from the snapshot data structure comprising metadata that is received from or derived from information originating from a second system (see module 8B70); and storing the version snapshot list in a storage location referenced by a file system index (see module 8B80).

Variations include:

-   Variations where the information originating from the second system comprises at least one of a timestamp from the second system, an IP address of the second system, an information tag from the second system, or any combination thereof.
-   Variations that further comprise performing an add block operation, by the second system, on the subject file using a new block identifier that refers to a new block added to the subject file.
-   Variations that further comprise performing a delete block operation on the subject file by removing a deleted block from the list of block identifiers that comprise the subject file.
-   Variations that further comprise performing an update block operation on the subject file by changing data stored in the subject file at the updated block.
-   Variations that further comprise serializing at least a portion of the index and storing the serialized index in a nonvolatile storage location.
-   Variations that further comprise receiving a request to restore a particular version snapshot, reading from the nonvolatile storage location to retrieve a stored index that corresponds to the particular version snapshot, and deserializing a stored version snapshot corresponding to the particular version snapshot.
-   Variations that further comprise presenting aspects of the stored index in a graphical user interface.
-   Variations that further comprise receiving a restore command in response to presentation of the graphical user interface.
-   Variations that further comprise processing the restore command in a background process.

Variations can include more, fewer, or different steps than those described above.
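Module 8B70 and the first variation above can be illustrated with an index whose entries carry metadata originating from a second system. In this hypothetical sketch, the second system supplies a timestamp, an IP address, and an optional information tag; the class names and the two query methods are assumptions made for illustration, and Python's standard `ipaddress` module is used only to reject malformed addresses.

```python
import ipaddress
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class SecondSystemInfo:
    """Metadata originating from the second system (hypothetical fields)."""
    timestamp: datetime
    ip_address: str
    tag: Optional[str] = None


@dataclass
class IndexEntry:
    file_name: str
    version: int
    block_ids: List[int]
    info: SecondSystemInfo


@dataclass
class SnapshotIndex:
    entries: List[IndexEntry] = field(default_factory=list)
    by_tag: Dict[str, List[IndexEntry]] = field(default_factory=dict)

    def add(self, entry: IndexEntry) -> None:
        # Raises ValueError for a malformed address, before anything is stored.
        ipaddress.ip_address(entry.info.ip_address)
        self.entries.append(entry)
        if entry.info.tag is not None:
            self.by_tag.setdefault(entry.info.tag, []).append(entry)

    def find_by_tag(self, tag: str) -> List[IndexEntry]:
        return list(self.by_tag.get(tag, []))

    def find_before(self, moment: datetime) -> List[IndexEntry]:
        # Time-based query: linear scan over the stored entries.
        return [e for e in self.entries if e.info.timestamp < moment]
```

Keeping a secondary dictionary keyed by tag is one way to make tag lookups independent of the number of stored snapshots; time-based queries here simply scan the entries.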

FIG. 8C depicts a system 8C00 as an arrangement of computing modules that are interconnected so as to operate cooperatively to implement certain of the herein-disclosed embodiments. As an option, the system 8C00 may be implemented in the context of the architecture and functionality of the embodiments described herein. Of course, however, the system 8C00 or any operation therein may be carried out in any desired environment.

The system 8C00 comprises at least one processor and at least one memory, the memory serving to store program instructions corresponding to the operations of the system. As shown, an operation can be implemented in whole or in part using program instructions accessible by a module. The modules are connected to a communication path 8C05, and any operation can communicate with other operations over communication path 8C05. The modules of the system can, individually or in combination, perform method operations within system 8C00. Any operations performed within system 8C00 may be performed in any order unless otherwise specified in the claims.

The shown embodiment implements a portion of a computer system, presented as system 8C00, comprising a computer processor to execute a set of program code instructions (see module 8C10) and modules for accessing memory to hold program code instructions to perform: receiving, by a filer virtual machine, a snapshot signal that initiates a quiescence request (see module 8C20); receiving an acknowledgement signal from at least one virtual machine process (see module 8C30); requesting a set of virtual machine attribute values (see module 8C40); processing at least some of the set of virtual machine attribute values to generate a volume index data structure comprising at least some of the virtual machine attribute values (see module 8C50); and storing the volume index data structure in a persistent storage facility (see module 8C60).
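The flow of system 8C00 can be sketched as below, assuming hypothetical `VmProcess` objects that acknowledge a quiescence request and a caller-supplied dictionary of virtual machine attribute values (in practice such values might be requested from a hypervisor, as recited in claim 3). A JSON file on local disk stands in for the persistent storage facility; `build_volume_index` is likewise a hypothetical name.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VmProcess:
    """Hypothetical process running inside the subject virtual machine."""
    name: str
    quiesced: bool = False

    def quiesce(self) -> bool:
        self.quiesced = True
        return True  # the acknowledgement signal


def build_volume_index(vm_id: str, processes: List[VmProcess],
                       attributes: Dict[str, object], path: str) -> Dict[str, object]:
    # Modules 8C20/8C30: issue the quiescence request, collect every acknowledgement.
    acks = [p.quiesce() for p in processes]
    if not all(acks):
        raise RuntimeError("quiescence not acknowledged")
    # Modules 8C40/8C50: keep the attribute values in a volume index structure.
    volume_index = {"vm_id": vm_id, "attributes": dict(sorted(attributes.items()))}
    # Module 8C60: write a temporary file in the same directory, then rename it
    # over the target so readers never see a half-written index.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(volume_index, handle)
    os.replace(tmp, path)
    return volume_index
```

Writing to a temporary file and then calling `os.replace` is one choice for module 8C60: the rename is atomic when both paths are on the same file system, so readers see either the previous volume index or the new one.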

SYSTEM ARCHITECTURE OVERVIEW

Additional System Architecture Examples

FIG. 9A depicts an architecture 9A00 comprising a collection of interconnected components suitable for implementing embodiments of the present disclosure and/or for use in the herein-described environments. The shown virtual machine architecture 9A00 includes a virtual machine instance in a configuration 901 that is further described as pertaining to the controller virtual machine instance 930. A controller virtual machine instance receives block I/O (input/output or IO) storage requests as network file system (NFS) requests in the form of NFS requests 902, and/or internet small computer systems interface (iSCSI) block IO requests in the form of iSCSI requests 903, and/or Samba file system (SMB) requests in the form of SMB requests 904. The controller virtual machine instance publishes and responds to an internet protocol (IP) address (e.g., see CVM IP address 910). Various forms of input and output (I/O or IO) can be handled by one or more IO control handler functions (see IOCTL handler functions 908) that interface to other functions such as data IO manager functions 914 and/or metadata manager functions 922. As shown, the data IO manager functions can include communication with a virtual disk configuration manager 912 and/or can include direct or indirect communication with any of various block IO functions (e.g., NFS IO, iSCSI IO, SMB IO, etc.).
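The routing of NFS, iSCSI, and SMB requests to IO control handler functions can be pictured as a dispatch table keyed by protocol. This is a hypothetical sketch: `Protocol`, `ControllerVm`, and the handler signature are assumptions for illustration and do not reflect the interfaces of any particular controller virtual machine.

```python
from enum import Enum
from typing import Callable, Dict


class Protocol(Enum):
    NFS = "nfs"
    ISCSI = "iscsi"
    SMB = "smb"


# A handler receives the raw request payload and returns a response.
Handler = Callable[[bytes], str]


class ControllerVm:
    """Hypothetical controller VM that routes storage requests by protocol."""

    def __init__(self) -> None:
        self.handlers: Dict[Protocol, Handler] = {}

    def register(self, protocol: Protocol, handler: Handler) -> None:
        self.handlers[protocol] = handler

    def handle(self, protocol: Protocol, payload: bytes) -> str:
        handler = self.handlers.get(protocol)
        if handler is None:
            # Requests for an unregistered protocol are rejected explicitly.
            raise ValueError(f"no handler for {protocol.value}")
        return handler(payload)
```

A table of this kind lets new protocol handlers be registered without changing the dispatch logic itself.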

In addition to block IO functions, the configuration 901 supports IO of any form (e.g., block IO, streaming IO, packet-based IO, HTTP traffic, etc.) through either or both of a user interface (UI) handler, such as UI I/O handler 940, and any of a range of application programming interfaces (APIs), possibly through the shown API I/O manager 945.

The communications link 915 can be configured to transmit (e.g., send, receive, signal, etc.) any type of communications packet comprising any organization of data items. The data items can comprise a payload data area as well as a destination address (e.g., a destination IP address) and a source address (e.g., a source IP address), and can include various packet processing techniques (e.g., tunneling), encodings (e.g., encryption), and/or formatting of bit fields into fixed-length blocks or into variable-length fields used to populate the payload. In some cases, packet characteristics include a version identifier, a packet or payload length, a traffic class, a flow label, etc. In some cases the payload comprises a data structure that is encoded and/or formatted to fit into byte or word boundaries of the packet.
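The packet characteristics named above can be illustrated with a small encoder. The layout is an assumption chosen for illustration: the first 32-bit word packs a 4-bit version, an 8-bit traffic class, and a 20-bit flow label in the same arrangement as the first word of an IPv6 header, followed by a 16-bit payload length, two padding bytes, and IPv4 source and destination addresses; the payload is zero-padded to a 4-byte word boundary. It is not a real IPv4 or IPv6 header, and `encode`/`decode` are hypothetical names.

```python
import ipaddress
import struct
from typing import Tuple

# Network byte order: 32-bit word, 16-bit payload length, 2 pad bytes, src, dst.
HEADER = struct.Struct("!IH2x4s4s")


def encode(version: int, traffic_class: int, flow_label: int,
           src: str, dst: str, payload: bytes) -> bytes:
    if not (0 <= version < 16 and 0 <= traffic_class < 256 and 0 <= flow_label < 2**20):
        raise ValueError("field out of range")
    word0 = (version << 28) | (traffic_class << 20) | flow_label
    header = HEADER.pack(word0, len(payload),
                         ipaddress.IPv4Address(src).packed,
                         ipaddress.IPv4Address(dst).packed)
    padding = b"\x00" * (-len(payload) % 4)  # pad payload to a 4-byte word boundary
    return header + payload + padding


def decode(packet: bytes) -> Tuple[int, int, int, str, str, bytes]:
    word0, length, src, dst = HEADER.unpack_from(packet)
    # The length field lets the decoder drop the trailing pad bytes.
    payload = packet[HEADER.size:HEADER.size + length]
    return (word0 >> 28, (word0 >> 20) & 0xFF, word0 & 0xFFFFF,
            str(ipaddress.IPv4Address(src)), str(ipaddress.IPv4Address(dst)), payload)
```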

In some embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement aspects of the disclosure. Thus, embodiments of the disclosure are not limited to any specific combination of hardware circuitry and/or software. In embodiments, the term “logic” shall mean any combination of software or hardware that is used to implement all or part of the disclosure.

The term “computer readable medium” or “computer usable medium” as used herein refers to any medium that participates in providing instructions to a data processor for execution. Such a medium may take many forms including, but not limited to, non-volatile media and volatile media. Non-volatile media includes, for example, solid state storage devices (SSD) or optical or magnetic disks such as disk drives or tape drives. Volatile media includes dynamic memory such as a random access memory. As shown, the controller virtual machine instance 930 includes a content cache manager facility 916 that accesses storage locations, possibly including local DRAM (e.g., through the local memory device access block 918) and/or possibly including accesses to local solid state storage (e.g., through local SSD device access block 920).

Common forms of computer readable media include any non-transitory computer readable medium, for example, floppy disk, flexible disk, hard disk, magnetic tape, or any other magnetic medium; CD-ROM or any other optical medium; punch cards, paper tape, or any other physical medium with patterns of holes; or any RAM, PROM, EPROM, FLASH-EPROM, or any other memory chip or cartridge. Any data can be stored, for example, in any form of external data repository 931, which in turn can be formatted into any one or more storage areas, and which can comprise parameterized storage accessible by a key (e.g., a filename, a table name, a block address, an offset address, etc.). An external data repository 931 can store any forms of data, and may comprise a storage area dedicated to storage of metadata pertaining to the stored forms of data. In some cases, metadata can be divided into portions. Such portions and/or cache copies can be stored in the external data repository and/or in a local storage area (e.g., in local DRAM areas and/or in local SSD areas). Such local storage can be accessed using functions provided by a local metadata storage access block 924. The external data repository 931 can be configured using a CVM virtual disk controller 926, which can in turn manage any number or any configuration of virtual disks.
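The arrangement in which metadata portions are held in an external data repository and cached in a local storage area can be sketched as a key-addressed repository fronted by a small least-recently-used cache. The `ExternalRepository` and `LocalMetadataCache` classes, the capacity of two entries, and the LRU eviction policy are all assumptions for illustration; the disclosure does not specify an eviction policy.

```python
from collections import OrderedDict
from typing import Dict, Optional


class ExternalRepository:
    """Hypothetical external data repository addressed by key."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self.reads = 0  # counts trips to the (slower) external repository

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        self.reads += 1
        return self._data.get(key)


class LocalMetadataCache:
    """Small LRU cache standing in for local DRAM or SSD metadata areas."""

    def __init__(self, backing: ExternalRepository, capacity: int = 2) -> None:
        self.backing = backing
        self.capacity = capacity
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        value = self.backing.get(key)
        if value is not None:
            self._entries[key] = value
            if len(self._entries) > self.capacity:
                self._entries.popitem(last=False)  # evict least recently used
        return value
```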

Execution of the sequences of instructions to practice certain embodiments of the disclosure is performed by one or more instances of a processing element such as a data processor or such as a central processing unit (e.g., CPU1, CPU2). According to certain embodiments of the disclosure, two or more instances of configuration 901 can be coupled by a communications link 915 (e.g., backplane, LAN, PSTN, wired or wireless network, etc.) and each instance may perform respective portions of sequences of instructions as may be required to practice embodiments of the disclosure.

The shown computing platform 906 is interconnected to the Internet 948 through one or more network interface ports (e.g., network interface port 923₁, network interface port 923₂, etc.). The configuration 901 can be addressed through one or more network interface ports using an IP address. Any operational element within computing platform 906 can perform sending and receiving operations using any of a range of network protocols, possibly including network protocols that send and receive packets (e.g., see network protocol packet 921₁ and network protocol packet 921₂).

The computing platform 906 may transmit and receive messages that can be composed of configuration data and/or any other forms of data and/or instructions organized into a data structure (e.g., communications packets). In some cases, the data structure includes program code instructions (e.g., application code) communicated through Internet 948 and/or through any one or more instances of communications link 915. Received program code may be processed and/or executed by a CPU as it is received, and/or program code may be stored in any volatile or non-volatile storage for later execution. Program code can be transmitted via an upload (e.g., an upload from an access device over the Internet 948 to computing platform 906). Further, program code and/or results of executing program code can be delivered to a particular user via a download (e.g., a download from the computing platform 906 over the Internet 948 to an access device).

The configuration 901 is merely one sample configuration. Other configurations or partitions can include further data processors, and/or multiple communications interfaces, and/or multiple storage devices, etc. within a partition. For example, a partition can bound a multi-core processor (e.g., possibly including embedded or co-located memory), or a partition can bound a computing cluster having a plurality of computing elements, any of which computing elements are connected directly or indirectly to a communications link. A first partition can be configured to communicate to a second partition. A particular first partition and particular second partition can be congruent (e.g., in a processing element array) or can be different (e.g., comprising disjoint sets of components).

A module as used herein can be implemented using any mix of any portions of the system memory and any extent of hard-wired circuitry including hard-wired circuitry embodied as a data processor. Some embodiments include one or more special-purpose hardware components (e.g., power control, logic, sensors, transducers, etc.). A module may include one or more state machines and/or combinational logic used to implement or facilitate the operational and/or performance characteristics of the embodiments disclosed herein.

Various implementations of the data repository comprise storage media organized to hold a series of records or files such that individual records or files are accessed using a name or key (e.g., a primary key or a combination of keys and/or query clauses). Such files or records can be organized into one or more data structures (e.g., data structures used to implement or facilitate aspects of the embodiments disclosed herein). Such files or records can be brought into and/or stored in volatile or non-volatile memory.
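Access by a combination of keys and by query clauses can be sketched with a record store that uses a composite key for direct lookups and a predicate function for query clauses. The `Record` fields and the `RecordStore` class are hypothetical, chosen to echo the virtual machine, file, and version attributes discussed earlier; modeling a query clause as a Python predicate is likewise an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    vm_id: str
    file_name: str
    version: int
    size: int


class RecordStore:
    """Hypothetical store keyed by a composite (vm_id, file_name, version) key."""

    def __init__(self, records: Iterable[Record] = ()) -> None:
        self._by_key: Dict[Tuple[str, str, int], Record] = {}
        for r in records:
            self.put(r)

    def put(self, record: Record) -> None:
        self._by_key[(record.vm_id, record.file_name, record.version)] = record

    def get(self, vm_id: str, file_name: str, version: int) -> Record:
        # Direct access by the full combination of keys.
        return self._by_key[(vm_id, file_name, version)]

    def query(self, clause: Callable[[Record], bool]) -> List[Record]:
        # A query clause is modeled as a predicate applied to every record;
        # results are returned in composite-key order.
        return sorted((r for r in self._by_key.values() if clause(r)),
                      key=lambda r: (r.vm_id, r.file_name, r.version))
```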

FIG. 9B depicts a containerized architecture 9B00 comprising a collection of interconnected components suitable for implementing embodiments of the present disclosure and/or for use in the herein-described environments. The shown containerized architecture 9B00 includes a container instance in a configuration 951 that is further described as pertaining to the container instance 950. The configuration 951 includes a daemon (as shown) that performs addressing functions such as providing access to external requestors via an IP address (e.g., “P.Q.R.S”, as shown), a protocol specification (e.g., “http:”), and possibly port specifications. The daemon can perform port forwarding to the container. A container can be rooted in a directory system and can be accessed by file system commands (e.g., “ls” or “ls -a”, etc.).

The container might optionally include an operating system 978; however, such an operating system need not be provided. Instead, a container can include a runnable instance 958, which is built (e.g., through compilation and linking, or just-in-time compilation, etc.) to include all of the library- and OS-like functions needed for execution of the runnable instance. In some cases, a runnable instance 958 can be built with a virtual disk configuration manager, any of a variety of data IO management functions, etc. In some cases, a runnable instance includes code for, and access to, a container virtual disk controller 976. Such a container virtual disk controller can perform any of the functions that the aforementioned CVM virtual disk controller 926 performs, yet such a container virtual disk controller 976 does not rely on a hypervisor or any particular operating system in order to perform its range of functions.

In the foregoing specification, the disclosure has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the disclosure. For example, the above-described process flows are described with reference to a particular ordering of process actions. However, the ordering of many of the described process actions may be changed without affecting the scope or operation of the disclosure. The specification and drawings are to be regarded in an illustrative sense rather than in a restrictive sense.

What is claimed is:
1. A method, comprising: placing a virtual machine into a quiesced state, wherein write requests are suspended for the virtual machine while the virtual machine is in the quiesced state; identifying virtual machine attribute values and virtual machine data of the virtual machine in the quiesced state; and storing a representation of the virtual machine in the quiesced state, the representation comprising the virtual machine attribute values and virtual machine data in a data structure.
2. The method of claim 1, wherein the data structure is stored in persistent storage.
3. The method of claim 1, wherein the virtual machine attribute values are requested from a hypervisor.
4. The method of claim 1, wherein the virtual machine attribute values comprise at least one of an allocated memory descriptor or a file descriptor.
5. The method of claim 1, wherein the virtual machine attribute values comprise at least one of memory contents based on an allocated memory descriptor, or file contents based on a file descriptor.
6. The method of claim 1, further comprising processing a query pertaining to at least some of the virtual machine attribute values.
7. The method of claim 1, further comprising forming a hash table from at least a portion of the data structure.
8. The method of claim 1, further comprising processing a query pertaining to the data structure by accessing a hash table.
9. The method of claim 1, wherein the virtual machine attribute values comprise at least one of a paging register entry, a list of open files, or a state variable.
10. A non-transitory computer readable medium having stored thereon a sequence of instructions which, when executed by a processor, performs a set of acts comprising: placing a virtual machine into a quiesced state, wherein write requests are suspended for the virtual machine while the virtual machine is in the quiesced state; identifying virtual machine attribute values and virtual machine data of the virtual machine in the quiesced state; and storing a representation of the virtual machine in the quiesced state, the representation comprising the virtual machine attribute values and virtual machine data in a data structure.
11. The computer readable medium of claim 10, wherein the data structure is stored in persistent storage.
12. The computer readable medium of claim 10, wherein the virtual machine attribute values are requested from a hypervisor.
13. The computer readable medium of claim 10, wherein the virtual machine attribute values comprise at least one of an allocated memory descriptor or a file descriptor.
14. The computer readable medium of claim 10, wherein the virtual machine attribute values comprise at least one of memory contents based on an allocated memory descriptor, or file contents based on a file descriptor.
15. The computer readable medium of claim 10, further comprising processing a query pertaining to at least some of the virtual machine attribute values.
16. The computer readable medium of claim 10, further comprising forming a hash table from at least a portion of the data structure.
17. The computer readable medium of claim 10, further comprising processing a query pertaining to the data structure by accessing a hash table.
18. The computer readable medium of claim 10, wherein the virtual machine attribute values comprise at least one of a paging register entry, a list of open files, or a state variable.
19. A system, comprising: a storage medium having stored thereon a sequence of instructions; and a processor that executes the sequence of instructions to cause the processor or processors to perform a set of acts, the set of acts comprising: placing a virtual machine into a quiesced state, wherein write requests are suspended for the virtual machine while the virtual machine is in the quiesced state; identifying virtual machine attribute values and virtual machine data of the virtual machine in the quiesced state; and storing a representation of the virtual machine in the quiesced state, the representation comprising the virtual machine attribute values and virtual machine data in a data structure.
20. The system of claim 19, wherein the data structure is stored in persistent storage.
21. A computer readable medium, comprising: providing a user interface for accessing a data structure comprising sets of virtual machine attribute values and virtual machine data received from a virtual machine while the virtual machine is in a quiesced state, wherein write requests are suspended for the virtual machine while the virtual machine is in the quiesced state; performing a lookup operation on an index of the data structure in response to input received from the user interface; and restoring a previous version of a data object corresponding to the lookup operation.
22. The computer readable medium of claim 21, wherein the lookup operation corresponds to a virtual machine identified at the user interface.
23. The computer readable medium of claim 21, wherein the lookup operation corresponds to a plurality of virtual machines.
24. The computer readable medium of claim 21, wherein the lookup operation corresponds to a date range provided at the user interface.
25. The computer readable medium of claim 21, wherein the lookup operation corresponds to a date range provided at the user interface.
26. The computer readable medium of claim 21, wherein restoring the previous version of the data object further comprises restoring a state of the virtual machine.